ABSTRACT

Despite the identification of horseshoe bats as the reservoir of SARS-related- 
coronaviruses (SARSr-CoVs), the origin of SARS-CoV ORF8, which contains the 29-nt 
signature deletion among human strains, remains obscure. Although two SARSr-Rs- 
BatCoVs, RsSHC014 and Rs3367, previously detected from Chinese horseshoe bats 
(Rhinolophus sinicus) in Yunnan, possessed 95% genome identities to human/civet 
SARSr-CoVs, their ORF8 exhibited only 32.2-33% aa identities to that of human/civet 
SARSr-CoVs. To elucidate the origin of SARS-CoV ORF8, we sampled 348 bats of 
various species in Yunnan, among which diverse alphacoronaviruses and 
betacoronaviruses, including potentially novel CoVs, were identified, with some showing 
potential interspecies transmission. The genomes of two betacoronaviruses, SARSr-Rf-
BatCoV YNLF_31C and YNLF_34C, from greater horseshoe bats (Rhinolophus
ferrumequinum), possessed 93% nt identities to human/civet SARSr-CoV genomes.
Although they displayed lower similarities to civet SARSr-CoVs than SARSr-Rs- 
BatCoV RsSHC014 and Rs3367 in S protein, their ORF8 demonstrated exceptionally 
high (80.4-81.3%) aa identities to that of human/civet SARSr-CoVs, compared to 
SARSr-BatCoVs from other horseshoe bats (23.2-37.3%). Potential recombination events 
were identified around ORF8 between SARSr-Rf-BatCoVs and SARSr-Rs-BatCoVs, 
leading to the generation of civet SARSr-CoVs. The expression of ORF8 subgenomic
mRNA suggested that this protein may be functional in SARSr-Rf-BatCoVs. The high
Ka/Ks ratio among human SARS-CoVs compared to SARSr-BatCoVs supported that
ORF8 is under strong positive selection during animal-to-human transmission. Molecular
clock analysis using ORF1ab showed that SARSr-Rf-BatCoV YNLF_31C and
YNLF_34C diverged from civet/human SARSr-CoVs at approximately 1990. SARS-
CoV ORF8 originated from SARSr-CoVs of greater horseshoe bats through
recombination, which may be important for animal-to-human transmission.
IMPORTANCE

Although horseshoe bats are the primary reservoir of SARS-related-coronaviruses 
(SARSr-CoVs), it is still unclear how these bat viruses have evolved to cross the species 
barrier to infect civet/human. Most human SARS-CoV epidemic strains contained a 
signature 29-nt deletion in ORF8 compared to civet SARSr-CoVs, suggesting that ORF8 
may be important for interspecies transmission. However, the origin of SARS-CoV ORF8 
remains obscure. In particular, SARSr-Rs-BatCoVs from Chinese horseshoe bats 
exhibited <40% aa identities to human/civet SARS-CoV in ORF8. We detected diverse 
alphacoronaviruses and betacoronaviruses among various bat species in Yunnan, 
including two SARSr-Rf-BatCoVs from greater horseshoe bats that possessed an ORF8 
with exceptionally high aa identities to that of human/civet SARSr-CoVs. We 
demonstrated recombination events around ORF8 between SARSr-Rf-BatCoVs and 
SARSr-Rs-BatCoVs, leading to the generation of civet SARSr-CoVs. Our findings offer 
insight into the evolutionary origin of SARS-CoV ORF8, which was likely acquired from
SARSr-CoVs of greater horseshoe bats through recombination.
INTRODUCTION

Coronaviruses (CoVs) are known to cause respiratory, enteric, hepatic and neurological 
diseases of varying severity in a variety of animals. They are currently classified into four 
genera, Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus, 
replacing the traditional three groups, groups 1 to 3 (1-4). The genus Betacoronavirus is
further classified into lineages A to D (3, 5, 6). Among CoVs that infect humans, human
CoV 229E (HCoV 229E) and human CoV NL63 (HCoV NL63) belong to
Alphacoronavirus; human CoV OC43 (HCoV OC43) and human CoV HKU1 (HCoV 
HKU1) belong to Betacoronavirus lineage A; Severe Acute Respiratory Syndrome- 
related CoV (SARSr-CoV) belongs to Betacoronavirus lineage B; and the recently 
emerged Middle East Respiratory Syndrome CoV (MERS-CoV) belongs to 
Betacoronavirus lineage C (7-16). The high recombination rate, coupled with the 
infidelity of the RNA-dependent RNA polymerase (RdRp), may have facilitated CoVs to 
adapt to new hosts and ecological niches, causing epidemics in animals and humans (5, 
17-24). 

The SARS epidemic and identification of SARSr-CoVs from palm civet and 
horseshoe bats in China have boosted interests in the discovery of novel CoVs in both 
humans and animals especially bats (25-29). With the exception of lineage A 
betacoronaviruses, bats are now known to be an important reservoir of diverse 
alphacoronaviruses and lineage B, C and D betacoronaviruses (30-38), with bat CoVs 
being the gene source for other mammalian CoVs (4). In particular, the findings of bat 
CoVs related to SARS-CoV and MERS-CoV suggested that bats may be the animal
origin of both SARS and MERS epidemics, while other animals have served as the
intermediate or amplifying hosts for animal-to-human transmission, palm civets in the
case of SARS and dromedary camels in MERS (25, 27, 28, 39-41). However, the
evolutionary paths from bat CoVs to CoVs capable of infecting intermediate hosts and
humans are not fully understood.

Journal of Virology

SARSr-CoVs have been detected in at least 11 different species of horseshoe bats 
(genus Rhinolophus) from various countries in Asia, Africa and Europe (27, 28, 35, 37, 
38, 42, 43). Related viruses have also been reported in bats of other genera, such as 
Chaerephon and Hipposideros, from Africa and China (43-45). However, it is still
unclear how these bat CoVs have evolved to generate the ancestor of civet/human 
SARSr-CoVs capable of crossing the species barrier. The genome organization of
SARSr-CoVs, similar to that of other CoVs, follows the characteristic gene order 5’-open
reading frame 1ab (ORF1ab), spike (S), ORF3, envelope (E), membrane (M), ORF6 to 8,
nucleocapsid (N)-3’. It is known that most human SARS-CoVs during the epidemic
contained a signature 29-nt deletion in ORF8 compared to civet SARSr-CoVs (25), 
suggesting that this genomic region may be important for interspecies transmission. 
However, the origin of SARS-CoV ORF8 remains obscure. Genomes of SARS-related 
Rhinolophus sinicus BatCoVs (SARSr-Rs-BatCoVs), previously designated SARSr-Rh- 
BatCoVs, from Chinese horseshoe bats (Rhinolophus sinicus) in Hong Kong and the 
Guangdong Province only shared 87-92% nucleotide (nt) identities to human/civet 
SARSr-CoV genomes (22, 27, 28). A subsequent study identified two SARSr-Rs- 
BatCoVs, RsSHC014 and Rs3367, in the Yunnan Province, which were more closely 
related to human/civet SARSr-CoVs (with 95% genome sequence identities) than any
other SARSr-BatCoVs (42). The S proteins of these two SARSr-Rs-BatCoVs from
Yunnan shared 90.1-92.3% amino acid (aa) identities to those of human/civet SARSr-
CoVs, compared to 79-80% aa identities between SARSr-Rs-BatCoVs from Hong Kong 
and human/civet SARSr-CoVs (27, 42). Moreover, a highly similar virus, SARSr-Rs- 
BatCoV WIV1, isolated in Vero E6 cells, was able to use angiotensin converting enzyme 
II (ACE2) from humans, civets, and Chinese horseshoe bats as receptor for cell entry, 
suggesting that intermediate hosts between bats and human/civets may not be necessary 
for interspecies transmission (42). However, considerable genetic distance still exists 
between the two SARSr-Rs-BatCoVs from Yunnan and human/civet SARSr-CoVs, 
especially in the ORF8 region with only 32.2-33% aa identities. 

To elucidate the evolutionary origin of SARS-CoV ORF8 and search for even 
closer bat CoV ancestors of SARS-CoV, we conducted a three-month study (May to July 
2013) on CoVs among various bats from different regions of the Yunnan Province. 
Diverse CoVs were detected, including two SARS-related Rhinolophus ferrumequinum 
BatCoVs (SARSr-Rf-BatCoVs) from greater horseshoe bats (Rhinolophus 
ferrumequinum), which possessed an expressed ORF8 much more closely related to 
human/civet SARSr-CoVs than CoVs detected from other bat species. Recombination 
and molecular clock analyses were also performed to elucidate the evolutionary paths and
time of interspecies transmission of SARSr-CoVs.
MATERIALS AND METHODS

Ethics statement. The collection of bat samples was approved and performed by the 
Yunnan Institute of Endemic Diseases Control and Prevention, Dali, Yunnan, China. All 
bats were maintained and handled using standard procedures approved by the Medical 
Ethical Committee of Yunnan Institute of Endemic Diseases Control and Prevention, 
China. 

Sample collection. Bats were captured from various locations in five counties of 
four prefectures of the Yunnan Province, China from May to July 2013 (Fig. 1). Samples 
were collected using procedures described previously (27, 46). All samples were placed 
in viral transport medium (Earle’s balanced salt solution, 0.09% glucose, 0.03% sodium 
bicarbonate, 0.45% bovine serum albumin, 50 mg/ml amikacin, 50 mg/ml vancomycin, 
40 U/ml nystatin) and stored at -80°C before RNA extraction. 

RNA extraction. Viral RNA was extracted from alimentary samples using 
QIAamp Viral RNA Mini Kit (QIAgen, Hilden, Germany). The RNA was eluted in 50 µl
of AVE buffer and was used as the template for RT-PCR. 

RT-PCR for CoVs and DNA sequencing. CoVs screening was performed by 
amplifying a 440-bp fragment of the RdRp gene of CoVs using conserved primers (5’- 
GGTTGGGACTATCCTAAGTGTGA-3’ and 5’- 
ACCATCATCNGANARDATCATNA-3’) targeted to RdRp genes of CoVs (12). 
Reverse transcription was performed using the SuperScript II kit (Invitrogen, Life 
Technologies, Grand Island, NY, USA). The PCR mixture (25 µl) contained cDNA, PCR
buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl, 3 mM MgCl2 and 0.01% gelatin), 200 µM
of each dNTP and 1.0 U Taq polymerase (Applied Biosystems, Life Technologies,
Grand Island, NY, USA). The mixtures were amplified in 40 cycles of 94°C for 1 min,
48°C for 1 min and 72°C for 1 min and a final extension at 72°C for 10 min in an 
automated thermal cycler (Applied Biosystems). Standard precautions were taken to 
avoid PCR contamination and no false-positive was observed in negative controls. 
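
The reverse screening primer above contains IUPAC ambiguity codes (N, R and D), so it stands for a pool of oligonucleotides rather than a single sequence. The following Python sketch is illustrative only and is not part of the original protocol; it counts and enumerates that pool using the standard IUPAC code table. The helper names `degeneracy` and `expand` are hypothetical and introduced here for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]

# Primer sequences as given in the text (5' to 3').
FORWARD = "GGTTGGGACTATCCTAAGTGTGA"
REVERSE = "ACCATCATCNGANARDATCATNA"

print(len(FORWARD), degeneracy(FORWARD))
print(len(REVERSE), degeneracy(REVERSE))
```

Under this code table the reverse primer expands to 4 x 4 x 2 x 3 x 4 = 384 sequences, whereas the forward primer is non-degenerate.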

The PCR products were gel-purified using the QIAquick gel extraction kit 
(QIAgen). Both strands of the PCR products were sequenced twice with an ABI Prism 
3700 DNA Analyzer (Applied Biosystems), using the two PCR primers. The sequences 
of the PCR products were compared with known sequences of the RdRp genes of CoVs 
in the GenBank database. A phylogenetic tree was constructed from the 266-bp fragments
of the RdRp gene by the maximum likelihood method, using the General Time Reversible
substitution model with gamma-distributed rate variation and a proportion of
invariable sites (GTR+G+I), in MEGA 5.0 (47).

Viral culture. The two samples positive for SARSr-Rf-BatCoVs were subjected to
virus isolation in Vero E6 (African green monkey kidney) and primary R. sinicus lung 
cells as described previously (21). 

Complete genome sequencing and analysis of SARSr-Rf-BatCoVs. Two 
complete genomes of SARSr-Rf-BatCoVs were amplified and sequenced using RNA 
extracted from the alimentary samples as templates. RNA was converted to cDNA by a 
combined random-priming and oligo(dT) priming strategy. The cDNA was amplified by 
degenerate primers as described previously (27). A total of 75 sets of primers, available 
on request, were used for PCR. The 5’ end of the viral genome was confirmed by rapid 
amplification of cDNA ends using the 5'/3' SMARTer™ RACE cDNA Amplification Kit
(Clontech, USA). Sequences were assembled and manually edited to produce the final
sequences. The nt sequences of the genomes and the deduced aa sequences of the ORFs
were compared to those of other CoVs using the coronavirus database CoVDB (48). 
Phylogenetic tree construction was performed using maximum likelihood method with 
MEGA 6.0. 

Recombination analysis. To detect possible recombination between different 
SARSr-BatCoVs and civet SARSr-CoVs, sliding window analysis was performed using 
nt alignment of the available genome sequences generated by ClustalX version 1.83 and 
edited manually with BioEdit version 7.1.3. Similarity Plot analysis and Bootscan 
analysis were performed using Simplot version 3.5.1 (49) (F84 model; window size, 1000 
bp; step, 200 bp) with civet SARSr-CoV SZ3 as query. 
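
The windowing scheme used for the similarity plot (window size 1000, step 200) can be sketched in Python as below. This is a simplified illustration, not the SimPlot algorithm: it reports raw percent identity per window rather than an F84-corrected distance, and the function name `window_identity` and the toy sequences are assumptions made for the example.

```python
def window_identity(query, reference, window=1000, step=200):
    """Percent identity between two aligned sequences in sliding windows.

    Returns a list of (window midpoint, percent identity) pairs. Columns in
    which either sequence has a gap ('-') are skipped.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        matches = 0
        compared = 0
        pairs = zip(query[start:start + window], reference[start:start + window])
        for q, r in pairs:
            if q == "-" or r == "-":
                continue
            compared += 1
            if q == r:
                matches += 1
        identity = 100.0 * matches / compared if compared else float("nan")
        points.append((start + window // 2, identity))
    return points

# Toy aligned sequences (hypothetical): identical for 12 columns, then divergent.
toy_query = "A" * 20
toy_reference = "A" * 12 + "C" * 8
print(window_identity(toy_query, toy_reference, window=8, step=4))
```

Plotting such per-window values for each candidate parent against the query is what makes an abrupt change in the closest relative, and hence a putative breakpoint, visible.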

Estimation of synonymous and non-synonymous substitution rates. The 
number of synonymous substitutions per synonymous site, Ks, and the number of non- 
synonymous substitutions per non-synonymous site, Ka, for each coding region were 
calculated for all available SARSr-Rf-BatCoV, SARSr-Rs-BatCoV, civet SARSr-CoV 
and human SARSr-CoV genomes using the Nei-Gojobori method (Jukes-Cantor) in 
MEGA 5.0. 
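
As a rough illustration of the Jukes-Cantor step in this calculation, the sketch below converts observed proportions of synonymous and non-synonymous differences into Ks and Ka. It assumes that the numbers of differences and sites have already been counted by the Nei-Gojobori procedure; it does not reproduce MEGA's implementation, and the function names and input values are hypothetical.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def ka_ks(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (Ka, Ks, Ka/Ks) from counts of differences and sites."""
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ratio = ka / ks if ks > 0 else float("inf")
    return ka, ks, ratio

# Hypothetical counts for one coding region (not data from this study).
ka, ks, ratio = ka_ks(syn_diffs=6, syn_sites=90, nonsyn_diffs=4, nonsyn_sites=270)
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ratio:.2f}")
```

A ratio below 1 is conventionally read as purifying selection and a ratio above 1 as positive selection, which is how the Ka/Ks values in this study are interpreted.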

Estimation of divergence dates. The time of the most recent common ancestor (tMRCA)
was estimated based on an alignment of ORF1ab and nsp5 sequences, using the uncorrelated
exponentially distributed relaxed clock model (UCED) in BEAST version 1.8
(http://evolve.zoo.ox.ac.uk/beast/) (50). Under this model, rates were allowed to vary
among branches, with the rate on each branch drawn independently from an exponential
distribution. The sampling dates of all strains were
collected from the literature or from the present study, and were used as calibration points.
Depending on the data set, Markov chain Monte Carlo (MCMC) sample chains were run
for 2 x 10° states, sampling every 1,000 generations under the GTR nt substitution model,
determined by MODELTEST and allowing γ-rate heterogeneity for all data sets. A
constant population coalescent prior was assumed for all data sets. The median and
highest posterior density (HPD) were calculated for each of these parameters from two
identical but independent MCMC chains using TRACER 1.3 (http://beast.bio.ed.ac.uk).
The tree was annotated by TreeAnnotator, a program of BEAST, and displayed by FigTree
(http://tree.bio.ed.ac.uk/software/figtree/).
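
The branch-rate assumption of the relaxed clock described in this paragraph can be written compactly as follows. The symbols are notation introduced here for illustration and are not taken from the BEAST documentation: r_b is the substitution rate on branch b (substitutions per site per year), mu is the mean rate, t_b is the duration of the branch in years, and l_b is the expected number of substitutions per site on that branch.

```latex
r_b \overset{\text{i.i.d.}}{\sim} \operatorname{Exponential}(\text{mean} = \mu),
\qquad
\ell_b = r_b \, t_b
```

Because the tips carry known sampling dates, the branch durations t_b are expressed in calendar time, which is what allows node ages such as the tMRCA to be reported as years.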


Expression of ORF8 and determination of leader-body junction sequence. 
The leader-body junction site and flanking sequences of the ORF8 subgenomic mRNA in 
SARSr-Rf-BatCoV YNLF_31C were determined using RT-PCR as described previously 
(21, 51). Briefly, RNA was extracted directly from the bat samples using TRIzol Reagent 
(Invitrogen). Reverse transcription was performed using random hexamers and the 
SuperScript II kit (Invitrogen). cDNA was PCR amplified with a forward primer (5’- 
CTACCCAGGAAAAGCCAAC-3’) located in the leader sequence and a reverse primer 
(5’-TGAACCATAGTGTGCCATCT-3’) located in the body of the ORF8 mRNA. The 
PCR mixture (25 µl) contained cDNA, PCR buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl,
2 mM MgCl2 and 0.01% gelatin), 200 µM of each dNTP and 1.0 U Taq polymerase
(Applied Biosystems). The mixtures were amplified in 60 cycles of 94°C for 1 min, 50°C
for 1 min and 72°C for 1 min and a final extension at 72°C for 10 min in an automated
thermal cycler (Applied Biosystems). RT-PCR products were subjected to agarose gel
electrophoresis, gel-purified using the QIAquick gel extraction kit (QIAgen) and sequenced to
obtain the leader-body junction sequences for the ORF8 subgenomic mRNA.


Nucleotide sequence accession numbers. The nt and genome sequences of the
CoVs detected in this study have been lodged within the GenBank sequence database
under accession no. KP886808, KP886809, and KP895482 to KP895525.
RESULTS

Detection of CoVs in bats. A total of 348 alimentary samples from bats belonging to 
five different genera were obtained from various regions of the Yunnan province. RT- 
PCR for a 440-bp fragment of the RdRp gene of CoVs was positive in alimentary 
samples from 46 bats of five species belonging to four genera (Table 1, Fig. 1). Sequence 
analysis showed that 35 samples contained diverse alphacoronaviruses, while 11 samples 
contained betacoronaviruses, including two lineage B betacoronaviruses and nine
lineage D betacoronaviruses. 
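
The counts reported in this paragraph can be cross-checked with a few lines of Python. The variable names are illustrative; the numbers are those given in the text.

```python
# Counts taken from the paragraph above.
total_samples = 348        # alimentary samples screened
positive = 46              # RT-PCR positive samples
alpha = 35                 # samples with alphacoronaviruses
beta_lineage_b = 2         # samples with lineage B betacoronaviruses
beta_lineage_d = 9         # samples with lineage D betacoronaviruses

beta = beta_lineage_b + beta_lineage_d
# The genus-level counts should add up to the number of positive samples.
assert alpha + beta == positive

positivity = 100 * positive / total_samples
print(f"Overall CoV positivity: {positivity:.1f}%")
print(f"Alphacoronaviruses among positives: {100 * alpha / positive:.1f}%")
print(f"Betacoronaviruses among positives: {100 * beta / positive:.1f}%")
```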

Detection of diverse bat alphacoronaviruses. Phylogenetic analysis of the 440- 
bp fragments of the RdRp gene of alphacoronaviruses detected in 35 bat samples showed 
that two sequences from one Rhinolophus stheno and one Myotis daubentonii captured in
Mojiang possessed 92-93% nt identities to Rhinolophus bat CoV HKU2 (Rh-BatCoV 
HKU2) (GenBank accession no. NC_009988.1) (Table 1, Fig. 2). Four sequences from M. 
daubentonii in Xiangyun possessed 81% nt identity to Rh-BatCoV HKU2 (GenBank 
accession no. NC_009988.1). Twenty-four sequences from M. daubentonii in Xiangyun 
possessed 78-99% nt identities to Myotis bat CoV HKU6 (My-BatCoV HKU6) (GenBank 
accession no. DQ249224.1). Two sequences from M. daubentonii in Mojiang possessed 
96% nt identities to Miniopterus bat CoV HKU7 (Mi-BatCoV HKU7) (GenBank 
accession no. DQ249226.1). One sequence from M. daubentonii in Mojiang possessed 
96% nt identities to Miniopterus bat CoV HKU8 (GenBank accession no. NC_010438.1). 
Two sequences from Hipposideros pomona in Mojiang possessed 81-87% nt identities to
Hipposideros bat CoV HKU10 (Hi-BatCoV HKU10) (GenBank accession no.
JQ989267.1).

Detection of lineage B and D bat betacoronaviruses. Phylogenetic analysis of
the 440-bp fragments of the RdRp gene of betacoronaviruses detected in two bat samples, 
YNLF_31C and YNLF_34C, showed that they belonged to Betacoronavirus lineage B, 
with 100% nt identities to human SARS-CoV TOR2 (GenBank accession no. 
AY274119.3) and 90% nt identities to SARSr-Rs-BatCoV HKU3 (GenBank accession no. 
DQ022305), thus representing SARSr-Rf-BatCoVs (Table 1, Fig. 2). Both samples were 
collected from greater horseshoe bats (Rhinolophus ferrumequinum) captured in Lufeng 
County, Chuxiong Yi Autonomous Prefecture (Fig. 1). Phylogenetic analysis of the 440- 
bp fragments of the RdRp gene of betacoronaviruses detected in nine other bat samples 
showed that they belonged to Betacoronavirus lineage D, with 75-79% nt identities to 
Rousettus bat coronavirus HKU9 (Ro-BatCoV HKU9) (GenBank accession no. 
NC_009021.1). These nine samples were collected from Leschenault’s rousettes 
(Rousettus leschenaulti) in Mengla County, Xishuangbanna Dai Autonomous Prefecture. 
Attempts to passage SARSr-Rf-BatCoV YNLF_31C and YNLF_34C in various cell lines
were not successful, with no cytopathic effect or viral replication being detected. 

Genome comparison between SARSr-Rf-BatCoV and other SARSr-CoVs. 
The complete genome sequences of the two SARSr-Rf-BatCoV strains, YNLF_31C and 
YNLF_34C, were obtained by assembly of the sequences of RT-PCR products obtained 
directly from alimentary samples. Their genome sizes were 29723 bases, with G + C 
content 40.7%, comparable to those of most other SARSr-CoVs (27, 28). They were 
closely related to each other with 99.9% overall nt identities, while they possessed 88.2% 
nt identities to the genomes of SARSr-Rs-BatCoV HKU3 and 93% nt identities to the
genomes of human/civet SARSr-CoVs. SARSr-Rf-BatCoV strains share similar genome
organization with other SARSr-CoV strains, containing the putative transcription
regulatory sequence (TRS) motif, 5’-ACGAAC-3’, at the 3’ end of the 5’ leader sequence
and preceding each ORF except ORF 7b. Similar to most other SARSr-BatCoVs, SARSr-
Rf-BatCoV YNLF_31C and YNLF_34C contained a single long ORF8.

The nsp3, S, ORF3 and ORF8 regions are known to be the most rapidly evolving 
regions among SARSr-CoV genomes (27, 28, 52, 53). Pairwise comparison of aa 
sequences between civet SARSr-CoV SZ3 and other SARSr-CoVs showed that the S and 
ORF3a of SARSr-Rf-BatCoV YNLF_31C and YNLF_34C displayed relatively low 
sequence identities to civet SARSr-CoV (Table 2). However, the nsp3 of SARSr-Rf-
BatCoV YNLF_31C and YNLF_34C exhibited 97.1% aa identity to civet SARSr-CoV,
which is comparable to the high sequence identity of 96.8 to 97.5% between civet 
SARSr-CoV and SARSr-BatCoVs, Rs3367, RsSHC014, WIV1 and BtCoV-Cp/2011, 
from Yunnan reported previously (42). Furthermore, an exceptionally high sequence 
identity (80.4-81.3% aa identity) was observed in the ORF8 between SARSr-Rf-BatCoVs 
and human/civet SARSr-CoVs, much higher than that between human/civet SARSr- 
CoVs and other SARSr-BatCoVs (23.2-37.3% aa identity). Therefore, civet SARSr-CoV 
SZ3 was most closely related to SARSr-Rs-BatCoV Rs3367 and WIV1 in S and ORF3a, 
but was most closely related to SARSr-Rf-BatCoVs in ORF8. 

The predicted receptor binding domain (RBD) of SARSr-Rf-BatCoV YNLF_31C
and YNLF_34C possessed 89% and 68.1% aa identities to that of SARSr-Rs-BatCoV
HKU3-1 and civet SARSr-CoV SZ3 respectively. Previous studies have identified five
critical residues (residues 442, 472, 479, 487 and 491) for ACE2 binding in human and
civet SARSr-CoVs (54). In particular, residues 479 and 487 are the two key residues that
are different between human and civet SARSr-CoV strains, with S→T substitution at
residue 487 resulting in 20-fold reduction in human ACE2 binding affinity (54). In 
SARSr-Rs-BatCoV Rs3367, two (residues 479 and 491) of the five critical residues were 
conserved. In SARSr-Rf-BatCoVs and most other SARSr-Rs-BatCoVs, only residue 491 
was conserved (Fig. 3). Compared to human/civet SARSr-CoVs and SARSr-Rs-BatCoV 
Rs3367, WIV1 and RsSHC014, the RBD of SARSr-Rf-BatCoV YNLF_31C and 
YNLF_34C, similar to some SARSr-BatCoV strains, contained two deletions of 5 aa and 
12 aa respectively. 

Phylogenetic analysis. Phylogenetic trees were constructed using nsp2, nsp3, 
nsp5, nsp12 (RdRp), S, ORF3a, ORF8 and N of SARSr-Rf-BatCoV YNLF_31C and
YNLF_34C and other SARSr-CoVs (Fig. 4). These regions were selected because they
are commonly used in phylogenetic analysis of CoVs (RdRp, S, N), represent regions
of rapid evolution in SARSr-CoVs (nsp3, ORF3, ORF8), or were free from recombination
upon subsequent analysis (nsp2, nsp5). In nsp2, nsp3, nsp5, RdRp, and N genes, SARSr-
Rf-BatCoV YNLF_31C and YNLF_34C were more closely related to other SARSr-
BatCoVs than to two other SARSr-Rf-BatCoV strains, Rf1 and BtCoV/273/2005,
previously detected from greater horseshoe bats in Hubei (28, 37). However, in S, ORF3
and ORF8, SARSr-Rf-BatCoV YNLF_31C and YNLF_34C were most closely related to
SARSr-Rf-BatCoV Rf1 and BtCoV/273/2005, forming a distinct cluster among other
SARSr-BatCoVs. 

In the S and ORF3 regions, human/civet SARSr-CoVs were most closely related to
SARSr-Rs-BatCoV Rs3367, WIV1 and RsSHC014 previously detected from Yunnan
bats (42). This is in line with the ability of SARSr-Rs-BatCoV WIV1 to replicate in
Vero E6 cells and use ACE2 as receptor (42). In nsp3, human/civet SARSr-CoVs were
most closely related to SARSr-Rf-BatCoV YNLF_31C and YNLF_34C as well as
SARSr-Rs-BatCoV Rs3367, WIV1 and RsSHC014. Furthermore, in ORF8, SARSr-Rf-
BatCoV strains were clustered with human and civet SARSr-CoV strains with high
bootstrap value of 990, whereas all SARSr-Rs-BatCoV strains, including Rs3367, WIV1
and RsSHC014, formed another cluster. This concurs with results from pairwise aa
sequence comparison, and suggests that the ORF8 of civet and human SARSr-CoV
originated from SARSr-Rf-BatCoVs from greater horseshoe bats instead of SARSr-Rs-
BatCoV from Chinese horseshoe bats.

Recombination analysis. Since the ORF8 of SARSr-Rf-BatCoVs showed high 
sequence identity to those of human/civet SARSr-CoVs, we hypothesized that the ancestor
of civet SARSr-CoVs acquired its ORF8 from SARSr-Rf-BatCoVs through
recombination between SARSr-Rf-BatCoVs from greater horseshoe bats and SARSr-Rs- 
BatCoVs from Chinese horseshoe bats. When civet SARSr-CoV SZ3 was used as the 
query for sliding window analysis with SARSr-Rf-BatCoV YNLF_31C and SARSr-Rs- 
BatCoV Rs3367 and HKU3 as potential parents, several recombination breakpoints were 
observed. In particular, two breakpoints, between which ORF8 was located, were 
identified (Fig. 5). Downstream to the first breakpoint at position 27128 and upstream to 
the second breakpoint at position 28635, an abrupt change in clustering occurred with 
high bootstrap support for clustering of civet SARSr-CoV SZ3 with SARSr-Rf-BatCoV 
YNLF_31C. This is in line with results from phylogenetic and similarity plot analysis. 
Moreover, using multiple alignments, civet SARSr-CoV SZ3 was shown to possess much
higher sequence similarities to SARSr-Rf-BatCoVs than to SARSr-Rs-BatCoVs within
ORF8, which includes the region corresponding to the 29-nt deletion found in human
SARS-CoVs (Fig. 5).

Besides ORF8, another region of interest was S, which was situated between two
breakpoints at positions 20900 and 26100, respectively (Fig. 5). Downstream to position
20900 and upstream to position 26100, an abrupt change in clustering occurred with high 
bootstrap support for clustering of civet SARSr-CoV SZ3 with SARSr-Rs-BatCoV 
Rs3367. This is in line with phylogenetic analysis and the ability of strain Rs3367 to use 
ACE2 as receptor for cellular entry (42). However, similarity plot analysis still showed 
substantial difference between the S of civet SARSr-CoV SZ3 and SARSr-Rs-BatCoV 
Rs3367, especially in the S1 region. 

Estimation of synonymous and non-synonymous substitution rates. Using all 
available SARSr-BatCoV genome sequences for analysis, the Ka/Ks ratios for various 
coding regions, as compared to those of civet SARSr-CoVs and human SARS-CoVs, are 
shown in Table 3. Notably, the Ka/Ks ratios for most coding regions of SARSr-BatCoVs, 
including ORF8 of SARSr-Rf-BatCoVs, were low, supporting purifying selection. In
contrast, many regions of civet SARSr-CoVs and human SARS-CoVs exhibited 
relatively high Ka/Ks ratios suggestive of positive selection. Positive selection was 
particularly strong at the S (Ka/Ks=3) and ORF3 (Ka/Ks=2) of civet SARSr-CoVs, and 
the M (Ka/Ks=2) and ORF8 (Ka/Ks=3.5) of human SARS-CoVs. 

Estimation of divergence dates. Using the uncorrelated relaxed clock model on
ORF1ab, the time of the most recent common ancestor (tMRCA) of all SARSr-CoVs was
estimated to be 1960.1 [highest posterior density regions at 95% (HPD), 1899.1 to
1988.6]. The tMRCA of human and civet SARSr-CoVs was estimated to be 2001.5
(HPDs, 1999.1 to 2002.5), approximately 2 years before the SARS epidemic. The
tMRCA of human/civet SARSr-CoVs, SARSr-Rp-BatCoV Rp3/2004, and SARSr-Rs- 
BatCoV RsSHC014/2011, Rs3367/2012 and WIV1/2012 was estimated to be 1995.3 
(HPDs, 1984.5 to 2001), while that of human/civet SARSr-CoVs, and SARSr-Rf- 
BatCoVs, was estimated to be 1990.6 (HPDs, 1973.2 to 1999.6) (Fig. 6). 

Since some regions in ORF1ab may be involved in recombination (Fig. 5), nsp5,
which was free from recombination, was also used for analysis and showed similar tree
topology. Using the uncorrelated relaxed clock model on nsp5, the time of the most
recent common ancestor (tMRCA) of all SARSr-CoVs was estimated to be 1961.5
[highest posterior density regions at 95% (HPD), 1898.9 to 1991.5]. The tMRCA of
human and civet SARSr-CoVs was estimated to be 2000.7 (HPDs, 1996.7 to 2002.6),
approximately 2 years before the SARS epidemic. The tMRCA of human/civet SARSr-
CoVs, SARSr-Rp-BatCoV Rp3/2004, and SARSr-Rs-BatCoV RsSHC014/2011,
Rs3367/2012 and WIV1/2012 was estimated to be 1996.3 (HPDs, 1985.2 to 2001.7),
while that of human/civet SARSr-CoVs, and SARSr-Rf-BatCoVs, was estimated to be
1989.9 (HPDs, 1969.6 to 2000.3) (Fig. 6). The estimated mean substitution rates of the
ORF1ab and nsp5 data sets under the uncorrelated exponentially distributed relaxed clock
model (UCED) were 2.00 x10° and 1.36 x10° substitution per site per year respectively, 
which are comparable to other CoVs and RNA viruses (55, 56). 

Expression of ORF8 and determination of leader-body junction sequence. 
CoVs are characterized by a unique mechanism of discontinuous transcription with the 
synthesis of a nested set of subgenomic mRNAs (1, 2). To determine if ORF8 is
expressed in SARSr-Rf-BatCoV and the location of the leader and body TRS used for
mRNA synthesis, the leader-body junction sites and flanking sequences of ORF8
subgenomic mRNA were determined. The obtained subgenomic mRNA sequence was
aligned to the leader sequence, which confirmed the core sequence of the TRS motifs as
5’-ACGAAC-3’ (Fig. 7), as in other SARSr-CoVs. The leader TRS and the ORF8
subgenomic mRNA exactly matched each other. The SARSr-Rf-BatCoV leader was
confirmed as the first 66 nt of the genome.
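
Candidate TRS core motifs can be located in a genome sequence with a simple string search, sketched below in Python. This is an illustration only and not the RT-PCR procedure used in the study: the function name `find_motif` and the toy sequence are hypothetical, and a sequence match alone does not show that a site is used for subgenomic mRNA synthesis.

```python
def find_motif(sequence, motif="ACGAAC"):
    """Return 1-based start positions of every occurrence of motif in sequence."""
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = sequence.find(motif, start + 1)
    return positions

# Toy sequence (hypothetical) containing two copies of the TRS core motif.
toy_genome = "TT" + "ACGAAC" + "GGG" + "ACGAAC" + "TT"
print(find_motif(toy_genome))
```

In practice, each hit would be compared with the experimentally determined leader-body junction to decide which motif copy serves as the body TRS for a given ORF.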
DISCUSSION

The ORF8 of civet SARSr-CoV is likely to have been acquired from SARSr-Rf-BatCoVs
in greater horseshoe bats (R. ferrumequinum) through recombination. In this study, two
SARSr-Rf-BatCoV strains, YNLF_31C and YNLF_34C, were identified from greater
horseshoe bats. Although their genomes only possessed 93% nt identities to the genomes
of human/civet SARSr-CoVs, which is lower than the 95% nt identities between
human/civet SARSr-CoV and SARSr-Rs-BatCoVs, Rs3367 and RsSHC014, from
Chinese horseshoe bats in Yunnan, the nsp3 and ORF8 of SARSr-Rf-BatCoV
YNLF_31C and YNLF_34C exhibited the highest aa identities among all SARSr-
BatCoVs to that of civet SARSr-CoV SZ3. In particular, their ORF8 demonstrated much 
higher aa identities (81.3%) to civet SARSr-CoV SZ3 than SARSr-BatCoVs from other 
horseshoe bats (23.2% to 37.3%). Phylogenetic analysis of the ORF8 revealed a distinct 
clade formed by human/civet SARSr-CoVs and SARSr-Rf-BatCoVs separate from other 
SARSr-BatCoVs. This is in line with a previous report showing that the ORF8 of SARSr-
Rf-BatCoV Rf1 was clustered with human/civet SARSr-CoVs but not SARSr-BatCoV
Rm1 and Rp3 upon phylogenetic analysis, although only one SARSr-Rf-BatCoV strain 
was available for analysis (28). Moreover, potential recombination sites were identified 
between SARSr-Rf-BatCoVs and SARSr-Rs-BatCoVs around the ORF8 region, leading 
to the generation of civet SARSr-CoV SZ3 with the ORF8 acquired from SARSr-Rf- 
BatCoVs. Similar to other regions of the genome, the ORF8 of SARSr-Rf-BatCoVs has 
been under purifying selection, which supports greater horseshoe bats as a reservoir for 
SARSr-Rf-BatCoVs. In contrast, the ORF8 of human SARS-CoVs was under strong
positive selection, which reflects the rapid evolution soon after interspecies jumping.
These findings support recombination as the key mechanism involved in the
acquisition of ORF8 by the ancestor of civet SARSr-CoVs. In fact, previous studies have 
demonstrated frequent recombination events between SARSr-Rs-BatCoV strains from 
different bat species of different geographical locations in China (22, 55). Moreover, a 
recombination breakpoint at the nsp16/S intergenic region was detected between SARSr-Rp-
BatCoV Rp3 from Pearson’s horseshoe bats (Rhinolophus pearsoni) and SARSr-Rf-
BatCoV Rf1 during the evolution of SARSr-BatCoVs to civet SARSr-CoV (22). On the
other hand, some genomic regions of SARSr-Rf-BatCoV YNLF_31C and YNLF_34C,
such as nsp3, RdRp and N, were evolutionarily distinct from two previously reported
SARSr-Rf-BatCoV strains, Rf1 and 273/2005, upon phylogenetic analysis. This suggests
that SARSr-Rf-BatCoVs from different geographical locations in China may have 
evolved separately through other recombination events. The present findings offer new 
insights into the origin and evolution of SARS-CoV, by showing that the ancestor of civet
SARSr-CoV is likely a recombinant virus with ORF8 originating from SARSr-Rf-
BatCoVs in greater horseshoe bats and other genome regions from different horseshoe
bats. 

Although SARSr-Rs-BatCoV Rs3367 and RsSHC014 represented the closest bat
CoVs to SARS-CoV in terms of genome identity, they were unlikely to be the immediate
ancestors of civet SARSr-CoVs. Previous molecular-dating studies estimated that the time
of divergence between human/civet and bat SARSr-CoVs ranged from 4 to 17 years 
before the SARS epidemic (22, 55, 57). SARSr-CoVs were also shown to be a newly 
emerged subgroup of Betacoronavirus, with the median date of their MRCA estimated to
be from 1961 to 1982 (55, 57). The present results are in line with such estimations, with
the tMRCA between human/civet and closest bat strains estimated to be approximately
1995 (8 years before the SARS epidemic) and that among all SARSr-CoVs
approximately 1960 using ORF1ab. Similar results were also obtained when using the nsp5
region, which was recombination-free. Moreover, we demonstrated that SARSr-Rf-
BatCoV YNLF_31C and YNLF_34C only diverged from civet/human SARSr-CoVs at
approximately 1990. This is in contrast to previous studies that showed SARSr-Rp-
BatCoV Rp3 as the only recently diverged strain (55, 57). Together with the evidence on
the acquisition of ORF8, it is likely that civet SARSr-CoV originated from
recombination between SARSr-Rs-BatCoVs and SARSr-Rf-BatCoVs from different
horseshoe bat species within several years before the SARS epidemic.

The overlapping habitat and geographical distribution of different horseshoe bats 
may have fostered recombination between different SARSr-BatCoVs and emergence of 
SARS-CoV. Chinese horseshoe bats are widely distributed throughout China including 
Yunnan, Guangdong and Hong Kong. While greater horseshoe bats are also widely 
distributed across different provinces in China including Yunnan, they are not found in 
Guangdong (58). The two bat species share similar diets and habits, such as the ability to
roost in man-made structures, suggesting that they may cohabit in similar
environments in Yunnan, the province with the highest biodiversity in China. In fact,
SARSr-Rf-BatCoV YNLF_31C and YNLF_34C, and SARSr-Rs-BatCoV Rs3367 and 
RsSHC014 were detected in Lufeng and Kunming of the Yunnan province respectively, 
which were only ~80 km apart and within the migration distances of horseshoe bats (Fig. 
1) (22, 59, 60). Since greater horseshoe bats are not found in Guangdong, recombination
between SARSr-Rf-BatCoVs and SARSr-Rs-BatCoVs with the generation of the ancestor
of civet SARSr-CoVs may have occurred in yet unidentified bats in Yunnan or nearby
provinces, which were then transported to wildlife markets in Guangdong and infected 
civets. Alternatively, recombination may have occurred in civets or other animals within 
wildlife farms or markets where many different wild animal species are often housed 
together (61). A possible scenario is that the animals were co-infected with SARSr-Rf- 
BatCoVs and SARSr-Rs-BatCoVs from different horseshoe bats, followed by 
recombination events. More extensive surveillance in bats from Yunnan and neighboring 
provinces, as well as wildlife markets in Guangdong may reveal the immediate ancestor 
of civet SARSr-CoVs. 

The ORF8 region, unique to SARSr-CoVs, is prone to mutations or deletions 
during interspecies transmission. One of the most striking genomic changes observed in 
SARS-CoV soon after its zoonotic transmission to humans was the acquisition of a 
characteristic 29-nt deletion which splits ORF8 into two ORFs, ORF8a and ORF8b (25, 
62). While SARS-CoVs isolated from the later human cases of the epidemic contained 
this 29-nt deletion, isolates from civets and some early human cases possessed a single 
continuous ORF8 (25, 63). Besides, some early human strains and a farmed civet strain 
from Hubei possessed an alternative 82-nt deletion in ORF8 (63). On the other hand, four 
late human isolates possessed a 415-nt deletion, resulting in the loss of the entire ORF8 
(63). Although studies using reverse genetics showed that the ORF8 is not essential for 
virus replication in vitro and in vivo (64, 65), the full-length 8ab protein is a functional
protein that is delivered by a cleavable signal sequence to the lumen of the endoplasmic
reticulum where it becomes N-glycosylated (62). Different subcellular localizations and
functions have also been reported for 8ab, 8a and 8b proteins (66-69). Inside the
endoplasmic reticulum, 8ab activates the ATF6 branch of unfolded-protein response (70).
The 8a protein enhances SARS-CoV replication and induces caspase-dependent apoptosis 
through a mitochondria-dependent pathway (66). Moreover, antibodies against 8a protein 
have been detected in sera of SARS patients (66). The 8b protein down-regulates the 
expression of the E protein, which supports a modulatory role in viral replication (68).
Moreover, overexpression of the 8b protein induces DNA synthesis (67). The 8b and 8ab 
proteins also play a role in the host ubiquitin-proteasome system (71). In this study, the 
expression of ORF8 subgenomic mRNA in SARSr-Rf-BatCoV YNLF_31C suggested 
that this protein may also be functional in SARSr-BatCoVs. Moreover, the high Ka/Ks
ratio among human SARS-CoVs compared to SARSr-BatCoVs suggests that ORF8 is
subject to rapid evolution under strong positive selection during animal-to-human
transmission. Further studies may help understand the importance of ORF8 evolution for 
interspecies transmission of SARSr-CoVs. 
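
The Ka/Ks comparison above can be illustrated with a minimal sketch that recomputes the ORF8 ratios from the Ka and Ks values listed in Table 3. The function names `ka_ks_ratio` and `classify` are hypothetical helpers introduced here for illustration, and the cut-off at 1 is the conventional reading of Ka/Ks rather than a threshold stated in this study.

```python
from typing import Optional

# (Ka, Ks) for ORF8 in each host group, copied from Table 3.
ORF8_RATES = {
    "SARSr-Rf-BatCoV": (0.021, 0.110),
    "SARSr-Rs-BatCoV": (0.035, 0.197),
    "Civet SARSr-CoV": (0.004, 0.000),
    "Human SARS-CoV": (0.007, 0.002),
}


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def classify(ratio: Optional[float]) -> str:
    """Conventional reading: <1 purifying, 1 neutral, >1 positive selection."""
    if ratio is None:
        return "undefined"
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


for host, (ka, ks) in ORF8_RATES.items():
    ratio = ka_ks_ratio(ka, ks)
    shown = "-" if ratio is None else f"{ratio:.3f}"
    print(f"{host:<16} Ka/Ks = {shown:>6}  ({classify(ratio)})")
```

Under these inputs only the human SARS-CoV group has a ratio above 1, while the civet ratio is undefined because Ks is 0.000 in Table 3.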

Besides SARSr-BatCoVs, diverse alphacoronaviruses and betacoronaviruses, 
including potentially novel CoVs, with potential interspecies transmission events were 
identified in this study. Bats are known important reservoirs of lineage B, C and D 
betacoronaviruses, while rodents are likely the reservoir of lineage A betacoronaviruses 
(30). Nine samples belonging to lineage D betacoronaviruses were detected in 
Leschenault’s rousettes (R. leschenaulti), a known reservoir of Ro-BatCoV HKU9 (24).
However, the partial RdRp sequences only possessed 75-79% nt identities to the latter,
suggesting that they may represent either novel CoV species or novel genotypes of Ro-
BatCoV HKU9. As for alphacoronaviruses, 24 samples from Daubenton’s bats (M.
daubentonii) contained viruses most closely related to My-BatCoV HKU6 with 78-99%
nt identities in the partial RdRp region, which may represent My-BatCoV HKU6 or
related viruses previously reported in the same bat species (38). Six samples contained 
alphacoronaviruses most closely related to Rh-BatCoV HKU2. However, four samples 
(YNXY_7C, YNXY_10C, YNXY_45 and YNXY_50C) from Daubenton’s bats 
possessed partial RdRp sequences of only 80-81% nt identities to that of Rh-BatCoV
HKU2, suggesting that they may represent novel CoVs. Although the other two samples 
(MJ_27C and MJ_69C) possessed RdRp sequences with 92-93% identities to that of Rh- 
BatCoV HKU2, they were detected from Daubenton’s bats and lesser brown horseshoe 
bats (R. stheno) instead of Chinese horseshoe bats (R. sinicus) previously reported to 
carry Rh-BatCoV HKU2 (34). This may suggest interspecies transmission of Rh-BatCoV 
HKU2 among different bat species. Two samples from Pomona roundleaf bats 
(Hipposideros pomona) contained alphacoronaviruses most closely related to Hi-
BatCoV HKU10. However, the partial RdRp sequences only possessed 81-87% nt
identity to the latter. We have previously described recent interspecies transmission of
BatCoV HKU10 between Leschenault’s rousettes (R. leschenaulti) and Pomona
roundleaf bats, two very different bats belonging to different families, through rapid 
evolution of the S protein (72). Further studies are warranted to determine if the two 
samples from Pomona roundleaf bats contained potentially novel CoVs closely related to
BatCoV HKU10 or variants of BatCoV HKU10 due to interspecies transmission.

LEGENDS TO FIGURES

FIG 1 Map showing five locations of bat sampling in four autonomous prefectures (AP) in 
Yunnan Province, China. Sampling locations in Yunnan are in red. The location of SARSr-Rs- 
BatCoV strains, Rs3367 and RsSHC014, detected in a previous study (42) is in blue. 

FIG 2 Phylogenetic analysis of the nt sequences of the 267-nt fragment of RdRp of the 46 
positive samples identified in bats in Yunnan in this study. The tree was constructed by 
maximum likelihood method with the model GTR+G. Bootstrap values were calculated from 
1000 trees and only values >700 are shown and given at nodes. The scale bar indicates 5 nt 
substitutions per site. The two SARSr-Rf-BatCoV strains YNLF_31C and YNLF_34C are in
red. The potentially novel bat CoVs are in purple. AntelopeCoV, sable antelope coronavirus
(EF424621); BatCoV CDPHE15/USA/2006, Bat coronavirus CDPHE15/USA/2006
(NC_022103.1); BatCoV/SC2013, Betacoronavirus/SC2013 (KJ473821.1); Erinaceus
CoV/VMC/DEU/2012, Betacoronavirus Erinaceus/VMC/DEU/2012 (NC_022643); BCoV, bovine
coronavirus (NC_003045); BAHKU22, bottlenose dolphin coronavirus HKU22 (KF793826);
BuCoV HKU11, bulbul coronavirus HKU11 (FJ376619); BWCoV SW1, beluga whale
coronavirus SW1 (NC_010646); CCoV, Canine coronavirus strain CCoV/NTU336/F/2008
(GQ477367.1); CCRCoV, Canine respiratory coronavirus strain K37 (JX860640.1); CmCoV
HKU21, common moorhen coronavirus HKU21 (NC_016996); CoV Neoromicia/PML-
PHE1/RSA/2011, coronavirus Neoromicia/PML-PHE1/RSA/2011 (KC869678); DcCoV
HKU23, dromedary camel coronavirus HKU23 (KF906251); ECoV, equine coronavirus
(NC_010327); FIPV, feline infectious peritonitis virus (AY994055); GiCoV, Giraffe
coronavirus US/OH3-TC/2006 (EF424622.1); HCoV-229E, human coronavirus 229E
(NC_002645); HCoV-HKU1, human coronavirus HKU1 (NC_006577); HCoV-NL63, human
coronavirus NL63 (NC_005831); HCoV-OC43, human coronavirus OC43 (NC_005147); Hi-
batCoV HKU10, Hipposideros bat coronavirus HKU10 (JQ989269); IBV-beaudette, beaudette
coronavirus (AY692454); Human MERS-CoV, Middle East respiratory syndrome
coronavirus (NC_019843.3); Human MERS-CoV EMC/2012, Human betacoronavirus
2c EMC/2012 (JX869059.2); Camel MERS-CoV KSA-CAMEL-363, Middle East respiratory
syndrome coronavirus isolate KSA-CAMEL-363 (KJ713298); MRCoV HKU18, magpie robin
coronavirus HKU18 (NC_016993); BatCoV 1A, Miniopterus bat coronavirus 1A (NC_010437);
BatCoV 1B, Miniopterus bat coronavirus 1B (NC_010436); Mi-batCoV HKU7, Miniopterus bat
coronavirus HKU7 (DQ249226); Mi-batCoV HKU8, Miniopterus bat coronavirus HKU8
(NC_010438); Mink CoV strain WD1127, Mink coronavirus strain WD1127 (NC_023760.1);
MunCoV HKU13, munia coronavirus HKU13 (FJ376622); MHV-A59, murine hepatitis
virus (NC_001846); My-batCoV HKU6, Myotis bat coronavirus HKU6 (DQ249224); NH CoV
HKU19, night heron coronavirus HKU19 (NC_016994); PEDV, porcine epidemic diarrhoea
virus (NC_003436); PHEV, porcine haemagglutinating encephalomyelitis virus
(NC_007732); Pi-BatCoV-HKU5-1, Pipistrellus bat coronavirus HKU5 (NC_009020); PorCoV
HKU15, porcine coronavirus HKU15 (NC_016990); PRCV, porcine respiratory coronavirus
(DQ811787); RbCoV HKU14, rabbit coronavirus HKU14 (NC_017083); RatCoV parker, rat
coronavirus parker (NC_012936); Rs-batCoV HKU2, Rhinolophus bat coronavirus HKU2
(EF203064); Ro-batCoV-HKU9, Rousettus bat coronavirus HKU9 (NC_009021); Ro-batCoV
HKU10, Rousettus bat coronavirus HKU10 (JQ989270); Human SARS-CoV TOR2, SARS-
related human coronavirus (NC_004718); Civet SARS-CoV SZ16, SARS-related palm civet
coronavirus (AY304488); Badger SARS-CoV, SARS-related badger coronavirus
CFB/SZ/94/03 (AY545919.1); SARSr-Rs-batCoV HKU3, SARS-related Rhinolophus bat
coronavirus HKU3 (DQ022305); Scotophilus BatCoV 512, Scotophilus bat coronavirus 512
(NC_009657); SpCoV HKU17, sparrow coronavirus HKU17 (NC_016992); TCoV, turkey
coronavirus (NC_010800); TGEV, transmissible gastroenteritis virus (DQ443743); ThCoV
HKU12, thrush coronavirus HKU12 (FJ376621); Ty-BatCoV-HKU4-1, Tylonycteris bat
coronavirus HKU4 (NC_009019); WECoV HKU16, white-eye coronavirus HKU16
(NC_016991); WiCoV HKU20, wigeon coronavirus HKU20 (NC_016995).

FIG 3 Multiple alignment of the amino acid sequences of the receptor-binding motifs of the 
spike proteins of human and civet SARSr-CoV and the corresponding sequences of SARSr- 
BatCoVs in different Rhinolophus species. Asterisks indicate positions that have fully 
conserved residues. Amino acid deletions among some SARSr-BatCoVs are highlighted yellow. 
The five critical residues for receptor binding in human SARS-CoV, at positions
442, 472, 479, 487 and 491, are highlighted pink.

FIG 4 Phylogenetic analyses of nsp2, nsp3, nsp5, RdRp, S, ORF3, ORF8 and N nucleotide
sequences of SARSr-BatCoVs from different bat species. The trees were constructed by the
maximum likelihood method using (A) GTR+G; (B) GTR+G; (C) GTR+G+I; (D) TN93+G; (E)
GTR+G; (F) TN93+G; (G) T92+G; (H) GTR+G substitution models respectively and bootstrap
values calculated from 1000 trees. Except for ORF3 and ORF8, all trees were rooted using
corresponding sequences of HCoV HKU1 (GenBank accession number NC_006577). Only
bootstrap values >70% are shown. (A) 1736 nt (B) 5019 nt (C) 908 nt (D) 2777 nt (E) 3638 nt
(F) 804 nt (G) 345 nt (H) 1222 nt positions respectively were included in the analyses. The
scale bars represent (A) 50 (B) 10 (C) 20 (D) 20 (E) 10 (F) 20 (G) 10 (H) 200 substitutions per
site respectively. Human and civet SARSr-CoVs are in green, SARSr-Rs-BatCoVs from R.
sinicus are in blue and SARSr-Rf-BatCoVs from R. ferrumequinum are in red. The two SARSr-
Rf-BatCoV strains YNLF_31C and YNLF_34C detected in this study are bolded.

FIG 5 (A) Bootscan (upper panel) and Simplot (lower panel) analysis using the genome 
sequence of civet SARSr-CoV strain SZ03 as the query sequence. Bootscanning was conducted 
with Simplot version 3.5.1 (F84 model; window size, 1000 bp; step, 200 bp) on a gapless nt 
alignment, generated with ClustalX. The red line denotes SARSr-Rf-BatCoV strain YNLF_31C, 
the blue line denotes SARSr-Rs-BatCoV strain Rs3367 and the black line denotes SARSr-Rs- 
BatCoV strain HKU3-1. The ORF8 region with potential recombination is highlighted yellow. 
(B) Multiple alignment of nt sequences from genome position 27000 to 28700. Bases conserved 
between civet SARSr-CoV SZ03 and SARSr-Rf-BatCoVs (strains YNLF_31C and Rf1) are 
marked in yellow boxes. Bases conserved between civet SARSr-CoV SZ03 and SARSr-Rs- 
BatCoVs (strains Rs3367 and HKU3-1) are marked in green boxes. The 29-nt deletion in 
human SARS coronavirus TOR2 is highlighted orange. The start codon and stop codon of ORF8
are labelled with black boxes. 
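
The idea behind the similarity plot in FIG 5A can be sketched as a sliding-window identity scan over a gapless alignment. This is only a simplified illustration: it uses raw percent identity rather than the F84-corrected distances or bootscanning that Simplot performs, the function name `window_identity` is hypothetical, and the short toy sequences stand in for the real genomes (the legend reports a 1000-bp window and a 200-bp step).

```python
from typing import List, Tuple


def window_identity(query: str, ref: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Percent identity of query vs. ref in sliding windows of a gapless alignment.

    Returns a list of (window_start, percent_identity) pairs.
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = ref[start:start + window]
        matches = sum(1 for a, b in zip(q, r) if a == b)
        profile.append((start, 100.0 * matches / window))
    return profile


# Toy data (hypothetical): the query copies parent A up to position 20
# and parent B afterwards, mimicking a single recombination breakpoint.
parent_a = "ACGT" * 10
parent_b = "TGCA" * 10
query = parent_a[:20] + parent_b[20:]

profile_a = window_identity(query, parent_a, window=10, step=5)
profile_b = window_identity(query, parent_b, window=10, step=5)

for (start, id_a), (_, id_b) in zip(profile_a, profile_b):
    print(f"window {start:>2}-{start + 9:>2}: A = {id_a:5.1f}%  B = {id_b:5.1f}%")
```

In this toy case the identity to parent A drops and the identity to parent B rises in the window that spans position 20, which is the kind of crossover the plot highlights around ORF8.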

FIG 6 Estimation of tMRCA of SARSr-CoVs based on ORF1ab (A) and nsp5 (B). The mean
estimated dates were labeled. The taxa were labeled with their sampling dates. 

FIG 7 SARSr-Rf-BatCoV YNLF_31C mRNA leader-body junction and flanking sequences. The
subgenomic ORF8 mRNA sequences are shown in alignment with the leader and the genomic
sequence. The start codon AUG in subgenomic RNA is depicted in red. The putative TRS is
depicted in boldface type and underlined. Identical bases between leader sequence and
subgenomic mRNA sequence are in blue. Identical bases between genome and subgenomic
mRNA sequences are in green.

Table 1. Detection of CoVs in different bat species by RT-PCR of the 440-bp fragment of RdRp gene

Scientific name            Common name                 No. of bats  No. of bats       CoV detected/closest   Nt identity to      Sampling location
                                                       tested       positive for CoV  match in GenBank       closest match (%)   of positive bats
Rhinolophus luctus         Woolly horseshoe bat        32           0                 -                      -                   -
Rhinolophus affinis        Intermediate horseshoe bat  22           0                 -                      -                   -
Rhinolophus ferrumequinum  Greater horseshoe bat       11           2                 SARS-CoV (2)           100                 Lufeng
Rhinolophus stheno         Lesser brown horseshoe bat  34           1                 Rs-BatCoV HKU2 (1)     92                  Mojiang
Hipposideros pomona        Pomona roundleaf bat        17           2                 Hi-BatCoV HKU10 (2)    81-87               Mojiang
Myotis daubentonii         Daubenton’s bat             98           32                My-BatCoV HKU6 (24)    78-99               Xiangyun
                                                                                      Rs-BatCoV HKU2 (1)     93                  Mojiang
                                                                                      Rs-BatCoV HKU2 (4)     80-81               Xiangyun
                                                                                      Mi-BatCoV HKU7 (2)     96                  Mojiang
                                                                                      Mi-BatCoV HKU8 (1)     96                  Mojiang
Rousettus leschenaulti     Leschenault’s rousette      115          9                 Ro-BatCoV HKU9 (9)     75-79               Mengla
Unknown bat species                                    19           0                 -                      -                   -

Table 2. Percentage amino acid identities of the selected predicted gene products of SARSr-CoVs to civet SARSr-CoV strain SZ3

                                nsp2   nsp3   nsp5   nsp12  S            ORF3   E      M      ORF8^a  N
Civet SARSr-CoV civet007        99.5   99.5   100.0  99.7   98.6         98.1   100.0  100.0  98.3    [illegible]
Civet SARSr-CoV SZ16            100.0  99.9   100.0  99.9   99.9         100.0  100.0  100.0  98.3    100.0
Human SARS-CoV BJ01             99.8   99.6   100.0  99.9   98.8         98.1   100.0  99.5   38.2    [illegible]
Human SARS-CoV GZ02             99.8   99.8   100.0  99.9   99.0         97.8   100.0  99.5   98.3    [illegible]
Human SARS-CoV Tor2             99.8   99.6   100.0  99.9   98.6         98.1   100.0  99.5   37.3    100.0
SARSr-Rs-BatCoV Rs3367          97.8   96.8   100.0  99.6   92.3         96.7   99.1   97.7   32.2    [illegible]
SARSr-Rs-BatCoV RsSHC014        98.3   96.8   99.7   99.6   90.1         96.7   99.1   97.7   33.0    99.5
SARSr-Rs-BatCoV WIV1            97.8   96.8   99.7   99.5   92.3         96.3   99.6   97.7   32.2    [illegible]
SARSr-Rs-BatCoV HKU3-1          90.6   91.7   99.3   98.6   77.9         81.3   97.4   98.2   31.4    [illegible]
SARSr-Rs-BatCoV HKU3-2          90.6   91.7   99.3   98.6   77.8         81.3   96.5   98.2   31.4    96.7
SARSr-Rs-BatCoV HKU3-3          90.6   91.7   99.3   98.6   77.9         81.3   96.1   98.2   31.4    [illegible]
SARSr-Rs-BatCoV HKU3-6          90.6   91.7   99.3   98.5   78.0         81.3   97.4   98.2   31.4    96.4
SARSr-Rs-BatCoV HKU3-8          90.0   91.7   99.0   98.8   78.1         81.7   97.4   96.4   23.2    [illegible]
SARSr-Rs-BatCoV HKU3-12         90.4   91.7   99.3   98.9   [illegible]  81.7   97.4   98.2   31.4    [illegible]
SARSr-Rs-BatCoV HKU3-13         90.6   91.2   99.3   98.6   78.0         81.0   97.4   98.2   31.4    96.4
SARSr-Rs-BatCoV Rs672/2006      98.3   87.1   99.3   99.7   78.0         89.4   98.7   98.2   32.2    [illegible]
SARSr-Rb-BatCoV BM48-31/BGR     70.8   75.9   94.4   97.7   74.8         69.4   96.5   89.4   -       87.2
SARSr-Rm-BatCoV 279/2005        89.6   90.3   99.7   99.1   78.6         83.2   97.4   96.8   31.7    [illegible]
SARSr-Rm-BatCoV Rm1             89.5   90.0   99.3   92.4   78.7         83.2   97.8   96.8   33.0    [illegible]
SARSr-Rp-BatCoV Rp3             96.7   95.1   99.7   92.8   78.4         83.2   99.6   96.8   33.0    97.9
SARSr-Rp-BatCoV Rp/Shaanxi2011  93.6   93.0   100.0  92.3   79.0         82.1   90.0   96.4   33.0    [illegible]
SARSr-Cp-BatCoV Cp/Yunnan2011   90.8   97.5   100.0  92.2   78.9         89.4   97.0   98.6   31.4    98.1
SARSr-Rf-BatCoV Rf1             90.1   92.0   99.7   91.6   76.5         85.7   96.1   97.3   80.4    [illegible]
SARSr-Rf-BatCoV 273/2005        89.8   92.3   99.7   98.4   76.6         85.7   98.7   97.3   80.4    [illegible]
SARSr-Rf-BatCoV YNLF_31C        95.0   97.1   99.7   99.5   77.3         86.8   97.4   98.2   81.3    98.1
SARSr-Rf-BatCoV YNLF_34C        95.0   97.1   99.7   99.0   77.3         86.8   99.1   98.2   81.3    [illegible]

^a The high amino acid identities in nsp3 and ORF8 between SARSr-Rf-BatCoVs and civet SARSr-CoV are in bold.


Accepted Manuscript Posted Online 


Table 3. Non-synonymous and synonymous substitution rates in the coding regions of SARSr-CoVs among different hosts

SARSr-Rf-BatCoV (n=4)           SARSr-Rs-BatCoV (n=17)          Civet SARSr-CoV (n=18)          Human SARS-CoV (n=122)
Gene    Ka     Ks     Ka/Ks     Gene    Ka     Ks     Ka/Ks     Gene    Ka     Ks     Ka/Ks^a   Gene    Ka     Ks     Ka/Ks
nsp1    0.013  0.081  0.161     nsp1    0.003  0.108  0.028     nsp1    0.000  0.000  -         nsp1    0.000  0.000  -
nsp2    0.036  0.349  0.103     nsp2    0.023  0.230  0.100     nsp2    0.001  0.003  0.333     nsp2    0.000  0.00   0.000
nsp3    0.030  0.414  0.073     nsp3    0.018  0.288  0.063     nsp3    0.001  0.002  0.500     nsp3    0.004  0.005  0.800
nsp4    0.012  0.391  0.031     nsp4    0.010  0.222  0.045     nsp4    0.001  0.002  0.500     nsp4    0.002  0.002  1.000
nsp5    0.003  0.442  0.007     nsp5    0.004  0.244  0.016     nsp5    0.001  0.000  -         nsp5    0.000  0.00   0.000
nsp6    0.009  0.331  0.027     nsp6    0.005  0.178  0.028     nsp6    0.000  0.002  0.000     nsp6    0.002  0.00   2.000
nsp7    0.018  0.549  0.033     nsp7    0.000  0.181  0.000     nsp7    0.002  0.000  -         nsp7    0.000  0.00   0.000
nsp8    0.004  0.249  0.016     nsp8    0.003  0.175  0.017     nsp8    0.001  0.000  -         nsp8    0.000  0.000  -
nsp9    0.000  0.199  0.000     nsp9    0.003  0.199  0.015     nsp9    0.001  0.000  -         nsp9    0.001  0.000  -
nsp10   0.011  0.355  0.031     nsp10   0.000  0.158  0.000     nsp10   0.000  0.000  -         nsp10   0.002  0.002  1.000
nsp12   0.038  0.109  0.349     nsp12   0.026  0.076  0.342     nsp12   0.000  0.003  0         nsp12   0.001  0.00   1.000
nsp13   0.002  0.347  0.006     nsp13   0.002  0.199  0.010     nsp13   0.000  0.003  0         nsp13   0.001  0.00   1.000
nsp14   0.006  0.485  0.012     nsp14   0.005  0.270  0.019     nsp14   0.001  0.003  0.333     nsp14   0.001  0.00   1.000
nsp15   0.016  0.452  0.035     nsp15   0.012  0.275  0.044     nsp15   0.000  0.000  -         nsp15   0.000  0.00   0
nsp16   0.008  0.306  0.026     nsp16   0.005  0.277  0.018     nsp16   0.002  0.002  1.000     nsp16   0.002  0.003  0.667
S       0.012  0.174  0.070     S       0.049  0.412  0.119     S       0.003  0.001  3.000     S       0.001  0.002  0.500
ORF3    0.012  0.065  0.185     ORF3    0.041  0.220  0.186     ORF3    0.002  0.001  2.000     ORF3    0.072  0.386  0.187
E       0.015  0.070  0.214     E       0.003  0.037  0.081     E       0.000  0.000  -         E       0.001  0.002  0.500
M       0.003  0.096  0.313     M       0.007  0.097  0.072     M       0.001  0.002  0.500     M       0.002  0.001  2.000
ORF8    0.021  0.110  0.190     ORF8    0.035  0.197  0.178     ORF8^b  0.004  0.000  -         ORF8^b  0.007  0.002  3.500
N       0.015  0.143  0.105     N       0.008  0.069  0.116     N       0.002  0.005  0.400     N       0.000  0.001  0

^a Ka/Ks ratios of >0.5 are in bold.
^b Only ORF8 sequences without deletions were included in analysis.
leader     ACCUCGAUCUCUUGUAGAUCUGUUCUUUAAACGAACUUUAAAAUCUGUGUGGCU
ORF8 mRNA  ACCUCGAUCUCUUGUAGAUCUGUUCUUUAAACGAACAUGAAACUUCUCAUUGUU
genome       UAUAGAAGAACCUUGUAACAAAGUCUAAACGAACAUGAAACUUCUCAUUGUU
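
The leader-body junction in the three sequences above can be located with a short sketch: the subgenomic mRNA should share its 5' end with the leader and its 3' end with the genome, and the stretch common to both is where the junction lies. The helper names below are hypothetical, and reading the shared stretch as containing the TRS assumes it corresponds to the boldface motif in FIG 7, which this text version does not mark.

```python
# Sequences copied from the alignment above (5' to 3').
leader = "ACCUCGAUCUCUUGUAGAUCUGUUCUUUAAACGAACUUUAAAAUCUGUGUGGCU"
mrna = "ACCUCGAUCUCUUGUAGAUCUGUUCUUUAAACGAACAUGAAACUUCUCAUUGUU"
genome = "UAUAGAAGAACCUUGUAACAAAGUCUAAACGAACAUGAAACUUCUCAUUGUU"


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading positions at which a and b are identical."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def common_suffix_len(a: str, b: str) -> int:
    """Number of trailing positions at which a and b are identical."""
    return common_prefix_len(a[::-1], b[::-1])


# Bases of the mRNA that match the leader (from the 5' end)
# and the genomic body sequence (from the 3' end).
lead_len = common_prefix_len(mrna, leader)
body_len = common_suffix_len(mrna, genome)

# The region matching both templates is where template switching occurred.
junction = mrna[len(mrna) - body_len:lead_len]
start_codon_pos = mrna.find("AUG", lead_len)

print("leader-derived bases:", lead_len)
print("body-derived bases:  ", body_len)
print("shared junction:     ", junction)
print("AUG offset in mRNA:  ", start_codon_pos)
```

For these sequences the shared junction is UAAACGAAC, which contains ACGAAC, the hexanucleotide usually described as the SARS-CoV TRS core, and the AUG start codon follows it immediately.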